Abstract

Let $W$ be a random variable with mean zero and variance $\sigma^2$. The distribution of a variate $W^*$, satisfying $EWf(W) = \sigma^2 Ef'(W^*)$ for smooth functions $f$, exists uniquely and defines the zero bias transformation on the distribution of $W$. The zero bias transformation shares many interesting properties with the well known size bias transformation for non-negative variables, but is applied to variables taking on both positive and negative values. The transformation can also be defined on more general random objects. The relation between the transformation and the expression $wf'(w) - \sigma^2 f''(w)$, which appears in the Stein equation characterizing the mean zero, variance $\sigma^2$ normal $\sigma Z$, can be used to obtain bounds on the difference $E\{h(W/\sigma) - h(Z)\}$ for smooth functions $h$ by constructing the pair $(W, W^*)$ jointly on the same space. When $W$ is a sum of $n$ not necessarily independent variates, under certain conditions which include a vanishing third moment, bounds on this difference of the order $1/n$ for classes of smooth functions $h$ may be obtained. The technique is illustrated by an application to simple random sampling.

1 Introduction 

Since 1972, Stein's method [?] has been extended and refined by many authors and has become a valuable tool for deriving bounds for distributional approximations, in particular, for normal and Poisson approximations for sums of random variables. (In the normal case, see, for example, Ho and Chen [11], Stein [?], Barbour [2], Götze [?], Bolthausen and Götze [?], Rinott [?], and Goldstein and Rinott [?].) Through the use of differential or difference equations which characterize the target distribution, Stein's method allows many different types of dependence structures to be treated, and yields computable bounds on the approximation error.

The Stein equation for the normal is motivated by the fact that $W \sim \mathcal{N}(\mu, \sigma^2)$ if and only if
$$E\{(W - \mu)f'(W) - \sigma^2 f''(W)\} = 0 \quad \text{for all smooth } f.$$

Given a test function $h$, let $\Phi h = Eh(Z)$ where $Z \sim \mathcal{N}(0, 1)$. If $W$ is close to $\mathcal{N}(\mu, \sigma^2)$, then $Eh((W - \mu)/\sigma) - \Phi h$ will be close to zero for a large class of functions $h$, and $E\{(W - \mu)f'(W) - \sigma^2 f''(W)\}$ will be close to zero for a large class of functions $f$. It is natural then, given $h$, to relate the functions $h$ and $f$ through the differential equation
$$(x - \mu)f'(x) - \sigma^2 f''(x) = h((x - \mu)/\sigma) - \Phi h, \qquad (1)$$
and, upon solving for $f$, to compute $Eh((W - \mu)/\sigma) - \Phi h$ by $E\{(W - \mu)f'(W) - \sigma^2 f''(W)\}$ for this $f$. A bound on $Eh((W - \mu)/\sigma) - \Phi h$ can then be obtained by bounding the difference between $E(W - \mu)f'(W)$ and $\sigma^2 Ef''(W)$.
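As a quick numerical illustration of this characterization (our own check, not part of the original argument), the sketch below evaluates $E\{(W - \mu)f'(W) - \sigma^2 f''(W)\}$ for the test function $f = \sin$, once for $W \sim \mathcal{N}(\mu, \sigma^2)$ by the trapezoidal rule and once for a two-point variable $W = \mu \pm \sigma$ with the same mean and variance. The helper names and the values $\mu = 0.7$, $\sigma = 1.3$ are illustrative assumptions.

```python
from math import cos, exp, pi, sin, sqrt

def normal_expectation(g, mu, sigma, half_width=12.0, steps=4000):
    """E g(W) for W ~ N(mu, sigma^2), by the trapezoidal rule in z = (W - mu)/sigma.

    The rule is extremely accurate for smooth integrands times a Gaussian, and
    truncating at |z| = 12 discards mass on the order of exp(-72).
    """
    h = 2 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        z = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * g(mu + sigma * z) * exp(-z * z / 2)
    return total * h / sqrt(2 * pi)

def stein_functional(w, mu, sigma):
    """(w - mu) f'(w) - sigma^2 f''(w) for the test function f = sin."""
    return (w - mu) * cos(w) + sigma ** 2 * sin(w)

def discrepancy_normal(mu, sigma):
    return normal_expectation(lambda w: stein_functional(w, mu, sigma), mu, sigma)

def discrepancy_two_point(mu, sigma):
    """Same functional for W = mu +/- sigma, each with probability 1/2."""
    return 0.5 * (stein_functional(mu + sigma, mu, sigma)
                  + stein_functional(mu - sigma, mu, sigma))

print(discrepancy_normal(0.7, 1.3))     # zero up to rounding
print(discrepancy_two_point(0.7, 1.3))  # clearly nonzero
```

Matching only the first two moments does not make the functional vanish, which is why its size can serve as a measure of distance from normality.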

Stein [?], Baldi, Rinott, and Stein [?], and Goldstein and Rinott [?], among others, were able to exploit a connection between the Stein equation and the size biasing of nonnegative random variables. If $W \ge 0$ has mean $0 < EW = \mu < \infty$, we say $W^s$ has the $W$-size biased distribution if for all $f$ such that $EWf(W)$ exists,
$$EWf(W) = \mu Ef(W^s). \qquad (2)$$

The connection between the Stein equation and size biasing is described in Goldstein and Rinott [?]. In brief, one can obtain a bound on $Eh((W - \mu)/\sigma) - \Phi h$ in terms of a pair $(W, W^s)$, coupled on a joint space, where $W^s$ has the $W$-size biased distribution. The terms in this bound will be small if $W$ and $W^s$ are close. The variates $W$ and $W^s$ will be close, for example, when $W = X_1 + \cdots + X_n$ is the sum of i.i.d. random variables, as then $W^s$ can be constructed by replacing a single summand $X_I$ by an independent variate $X_I^s$ that has the $X_I$-size biased distribution. Similar constructions exist for non-identically distributed and possibly dependent variates, and are studied in [?].
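To make definition (2) and the i.i.d. construction concrete, the sketch below size biases a discrete distribution exactly with rational arithmetic. For a sum of i.i.d. Bernoulli($p$) variables, the size biased version of a single summand is identically 1, so replacing one summand should give $W^s = 1 + \mathrm{Binomial}(n-1, p)$; the code checks this for the illustrative choice $n = 5$, $p = 1/3$. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def size_bias(pmf):
    """Size-biased pmf of a nonnegative discrete W, per (2): P(W^s = w) = w P(W = w) / EW."""
    mean = sum(w * p for w, p in pmf.items())
    return {w: w * p / mean for w, p in pmf.items() if w * p != 0}

def binomial_pmf(n, p):
    return {k: comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)}

def expect(pmf, f):
    return sum(p * f(w) for w, p in pmf.items())

p = Fraction(1, 3)
W = binomial_pmf(5, p)
Ws = size_bias(W)
# Replace one Bernoulli summand by its size-biased version, which is identically 1
replaced = {k + 1: q for k, q in binomial_pmf(4, p).items()}
print(Ws == replaced)  # True
# Identity (2) with f(w) = w**2: E[W f(W)] = EW * E f(W^s)
print(expect(W, lambda w: w ** 3) == expect(W, lambda w: w) * expect(Ws, lambda w: w ** 2))  # True
```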

As noted in [2], the size biasing method works well for combinatorial problems such as counting the number of vertices in a random graph having prespecified degrees. When the distributions approximated are counts, size biasing is natural; in particular, the counts $W$ are necessarily nonnegative. To size bias a $W$ which may take on both positive and negative values, it may be that for some $\rho$, $W + \rho$ or $-W + \rho$ is a nonnegative random variable whose mean exists. Yet if $W$ has support on both the infinite positive and negative half lines, then some truncation must be involved in order to obtain a nonnegative random variable on which the size bias transformation can be performed. This is especially unnatural if $W$ is symmetric, as one would expect that $W$ itself would be closer to normal than any version of itself involving translation and truncation.

The transformation and associated coupling which we study here has many similarities to the size biasing approach, yet it may be applied directly to mean zero random variables and is particularly useful for symmetric random variables or those with vanishing third moment. The transformation is motivated by the size bias transformation and the equation that characterizes the mean zero normal:
$$W \sim \mathcal{N}(0, \sigma^2) \quad \text{if and only if} \quad EWf(W) = \sigma^2 Ef'(W) \text{ for all smooth } f. \qquad (3)$$

The similarity of the latter equation to equation (2) suggests, given a mean zero random variable $W$, considering a new distribution related to the distribution of $W$ according to the following definition.

Definition 1.1 Let $W$ be a mean zero random variable with finite, nonzero variance $\sigma^2$. We say that $W^*$ has the $W$-zero biased distribution if for all differentiable $f$ for which $EWf(W)$ exists,
$$EWf(W) = \sigma^2 Ef'(W^*). \qquad (4)$$

The existence of the zero bias distribution for any such $W$ is easily established. For a given $g \in C_c$, the collection of continuous functions with compact support, let $G(w) = \int_0^w g(u)\,du$. The quantity
$$Tg = \sigma^{-2}E\{WG(W)\}$$
exists since $EW^2 < \infty$, and defines a linear operator $T: C_c \to \mathbb{R}$. To see moreover that $T$ is positive, take $g \ge 0$. Then $G$ is increasing, and therefore $W$ and $G(W)$ are positively correlated. Hence $EWG(W) \ge EW\,EG(W) = 0$, and $T$ is positive. Now invoking the Riesz representation theorem (see, e.g., [7]), we have $Tg = \int g\,d\nu$ for some unique Radon measure $\nu$, which is a probability measure by virtue of $T1 = 1$. In fact, the $W$-zero biased distribution is continuous for any nontrivial $W$; the density of $W^*$ is calculated explicitly in Lemma 2.1, part 2.
Definition 1.1 describes a transformation, which we term the zero bias transformation, on distribution functions with mean zero and finite variance. However, for any $W$ with finite variance we can apply the transformation to the centered variate $W - EW$.

The zero bias transformation has many interesting properties, some of which we collect below in Lemma 2.1. In particular, the mean zero normal is the unique fixed point of the zero bias transformation. From this it is intuitive that $W$ will be close to normal in distribution if and only if $W$ is close in distribution to $W^*$.

Use of the zero bias method, as with other like techniques, is through the use of coupling and a Taylor expansion of the Stein equation; in particular, we have
$$E\{Wf'(W) - \sigma^2 f''(W)\} = \sigma^2 E\{f''(W^*) - f''(W)\},$$
and the right hand side may now immediately be expanded about $W$. In contrast, the use of other techniques such as size biasing requires an intermediate step which generates an additional error term (e.g., see equation (19) in [?]). For this reason, using the zero bias technique one is able to show why bounds of smaller order than $1/\sqrt{n}$ for smooth functions $h$ may be obtained when certain additional moment conditions apply.

For distributions with smooth densities, Edgeworth expansions reveal a similar phenomenon to what is studied here. For example (see Feller [?]), if $F$ has a density and vanishing third moment, then an i.i.d. sum of variates with distribution $F$ has a density which can be uniformly approximated by the normal to within a factor of $1/n$. However, these results depend on the smoothness of the parent distribution $F$. What we show here, in the i.i.d. case say, is that for smooth test functions $h$, bounds of order $1/n$ hold for any $F$ with vanishing third moment and finite fourth moment (see Corollary 3.1).

Generally, bounds for non-smooth functions are more informative than bounds for smooth functions (see for instance Götze [?], Bolthausen and Götze [?], Rinott and Rotar [?], and Dembo and Rinott [5]); bounds for non-smooth functions can be used for the construction of confidence intervals, for instance. Although the zero bias method can be used to obtain bounds for non-smooth functions, we will consider only smooth functions for the following reason. At present, constructions for use of the zero bias method are somewhat more difficult to achieve than constructions for other methods; in particular, compare the size biased construction in Lemma 2.1 of [?] to the construction in Theorem 2.1 here. Hence, for non-smooth functions, other techniques may be easier to apply. However, under added assumptions, the extra effort in applying the zero bias method will be rewarded by improved error bounds which may not hold over the class of non-smooth functions. For example, consider the i.i.d. sum of symmetric $+1, -1$ variates; the bound on non-smooth functions of order $1/\sqrt{n}$ is unimprovable and may be obtained by a variety of methods. Yet a bound of order $1/n$ holds for smooth functions, and can be shown to be achieved by the zero bias method. Hence, in order to reap the improved error bound benefit of the zero bias method when such can be achieved, we restrict attention to the class of smooth functions.
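The contrast in the $\pm 1$ example can be checked with closed forms. For $W$ a sum of $n$ i.i.d. symmetric $\pm 1$ variates, $E\cos(tW) = \cos(t)^n$, so for the smooth test function $h = \cos$ the error $Eh(W/\sqrt{n}) - \Phi h$ is available exactly, while for the non-smooth indicator $h = \mathbf{1}_{(-\infty, 0]}$ the error equals $P(W = 0)/2$ when $n$ is even. In the sketch below (the two test functions are our illustrative choices), $n$ times the first error tends to $-e^{-1/2}/12 \approx -0.0505$ and $\sqrt{n}$ times the second tends to $1/\sqrt{2\pi} \approx 0.399$, consistent with rates $1/n$ and $1/\sqrt{n}$.

```python
from math import comb, cos, exp, pi, sqrt

def smooth_error(n):
    """E cos(W/sqrt(n)) - E cos(Z) for W a sum of n i.i.d. symmetric +/-1 variates.

    E cos(tX) = cos(t) for X = +/-1, so E cos(W/sqrt(n)) = cos(1/sqrt(n))**n,
    and E cos(Z) = exp(-1/2) for Z ~ N(0, 1).
    """
    return cos(1 / sqrt(n)) ** n - exp(-0.5)

def cdf_error_at_zero(n):
    """P(W <= 0) - Phi(0) for even n, which equals P(W = 0) / 2 by symmetry."""
    return comb(n, n // 2) / 2 ** n / 2

for n in (100, 1_000, 10_000):
    # n * (smooth error) and sqrt(n) * (cdf error) both stay bounded
    print(n, n * smooth_error(n), sqrt(n) * cdf_error_at_zero(n))

print(-exp(-0.5) / 12, 1 / sqrt(2 * pi))  # the two limits
```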

Ideas related to the zero bias transformation have been studied by Ho and Chen [11] and Cacoullos et al. [?]. Ho and Chen consider the zero bias distribution implicitly (see equation (1.3) of [11]) in their version of one of Stein's proofs of the Berry-Esseen theorem. They treat a case with $W$ the sum of dependent variates, and obtain rates of $1/\sqrt{n}$ for the $L_p$ norm of the difference between the distribution function of $W$ and the normal.
The approach of Cacoullos et al. [?] is also related to what is studied here. In the zero bias transformation, the distribution of $W$ is changed to that of $W^*$ on the right hand side of identity (3), keeping the form of this identity, yielding (4). In [?], the distribution of $W$ is preserved on the right hand side of (3), and the form of the identity is changed to $E[Wf(W)] = \sigma^2 E[u(W)f'(W)]$, with the function $u$ determined by the distribution of $W$. Note that both approaches reduce to identity (3) when $W$ is normal; in the first case $W^* = W$, and in the second, $u(w) = 1$.

The paper is organized as follows. In Section 2, we present some of the properties of the zero bias transformation and give two coupling constructions that generate $W$ and $W^*$ on a joint space. The first construction, Lemma 2.1, part 5, is for the sum of independent variates, and its generalization, Theorem 2.1, is for possibly dependent variates. In Section 3, we show how the zero bias transformation may be used to obtain bounds on the accuracy of the normal approximation in general. In Section 4, we apply the preceding results to obtain bounds of the order $1/n$ for smooth functions $h$ when $W$ is a sum obtained from simple random sampling without replacement (a case of global dependence), under a vanishing third moment assumption.

2 The Zero Bias Transformation 

The following lemma summarizes some of the important features of the zero bias transformation; property 4 for $n = 1$ will be of special importance, as it gives that $EW^* = 0$ whenever $EW^3 = 0$.

Lemma 2.1 Let $W$ be a mean zero variable with finite, nonzero variance, and let $W^*$ have the $W$-zero biased distribution in accordance with Definition 1.1. Then,

1. The mean zero normal is the unique fixed point of the zero bias transformation.

2. The zero bias distribution is unimodal about zero and continuous with density function $p(w) = \sigma^{-2}E[W; W > w]$. It follows that the support of $W^*$ is the closed convex hull of the support of $W$ and that $W^*$ is bounded whenever $W$ is bounded.

3. The zero bias transformation preserves symmetry.

4. $\sigma^2 E(W^*)^n = EW^{n+2}/(n+1)$ for $n \ge 1$.

5. Let $X_1, \ldots, X_n$ be independent mean zero random variables with $EX_i^2 = \sigma_i^2$. Set $W = X_1 + \cdots + X_n$, and $EW^2 = \sigma^2$. Let $I$ be a random index independent of the $X$'s such that
$$P(I = i) = \sigma_i^2/\sigma^2.$$
Let
$$W_i = W - X_i = \sum_{j \ne i} X_j.$$
Then $W_I + X_I^*$ has the $W$-zero biased distribution, where for each $i$ the variate $X_i^*$ has the $X_i$-zero biased distribution and is independent of $X_1, \ldots, X_n$ and $I$. (This is analogous to size biasing a sum of non-negative independent variates by replacing a variate chosen proportional to its expectation by one chosen independently from its size biased distribution; see Lemma 2.1 in [?].)

6. Let $X$ be mean zero with variance $\sigma_X^2$ and distribution $dF$. Let $(\hat X', \hat X'')$ have distribution
$$d\hat F(\hat x', \hat x'') = \frac{(\hat x' - \hat x'')^2}{2\sigma_X^2}\, dF(\hat x')\, dF(\hat x'').$$
Then, with $U$ an independent uniform variate on $[0, 1]$, $U\hat X' + (1 - U)\hat X''$ has the $X$-zero biased distribution.

Proof of claims:

1. This is immediate from Definition 1.1 and the characterization (3).

2. The function $p(w)$ is increasing for $w < 0$, and decreasing for $w > 0$. Since $EW = 0$, $p(w)$ has limit 0 at both plus and minus infinity, and $p(w)$ must therefore be nonnegative and unimodal about zero. That $p$ integrates to 1 and is the density of a variate $W^*$ which satisfies (4) follows by uniqueness (see the remarks following Definition 1.1), and by applying Fubini's theorem separately to $E[f'(W^*); W^* \ge 0]$ and $E[f'(W^*); W^* < 0]$, using
$$E[W; W > w] = -E[W; W \le w],$$
which follows from $EW = 0$.

3. If $w$ is a continuity point of the distribution function of a symmetric $W$, then
$$E[W; W > w] = E[-W; -W > w] = -E[W; W < -w] = E[W; W > -w],$$
using $EW = 0$. Thus, there is a version of the $dw$ density of $W^*$ which is the same at $w$ and $-w$ for almost all $w$ $[dw]$; hence $W^*$ is symmetric.

4. Substitute $w^{n+1}/(n+1)$ for $f(w)$ in the characterizing equation (4).

5. Using independence and equation (4) with $X_i$ replacing $W$,
$$\begin{aligned}
\sigma^2 Ef'(W^*) = EWf(W) &= \sum_{i=1}^n EX_if(W) = \sum_{i=1}^n EX_if(W_i + X_i) \\
&= \sum_{i=1}^n \sigma_i^2\, Ef'(W_i + X_i^*) = \sigma^2 \sum_{i=1}^n \frac{\sigma_i^2}{\sigma^2}\, Ef'(W_i + X_i^*) \\
&= \sigma^2 Ef'(W_I + X_I^*).
\end{aligned}$$
Hence, for all smooth $f$, $Ef'(W^*) = Ef'(W_I + X_I^*)$, and the result follows.

6. Let $X', X''$ denote independent copies of the variate $X$. Then,
$$\begin{aligned}
\sigma_X^2\, Ef'(U\hat X' + (1 - U)\hat X'') &= \sigma_X^2\, E\left(\frac{f(\hat X') - f(\hat X'')}{\hat X' - \hat X''}\right) \\
&= \tfrac12 E(X' - X'')(f(X') - f(X'')) \\
&= EX'f(X') - EX''f(X') \\
&= EXf(X) \\
&= \sigma_X^2\, Ef'(X^*).
\end{aligned}$$
Hence, for all smooth $f$, $Ef'(U\hat X' + (1 - U)\hat X'') = Ef'(X^*)$.
□
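Parts 4 and 6 can be checked exactly on a small discrete example. The sketch below (helper name hypothetical) computes $Ef'(X^*)$ through the construction in part 6: averaging $f'$ along the segment between $\hat x'$ and $\hat x''$ gives $(f(\hat x') - f(\hat x''))/(\hat x' - \hat x'')$, and each ordered pair is weighted by $(\hat x' - \hat x'')^2\,dF(\hat x')\,dF(\hat x'')/(2\sigma_X^2)$. With rational arithmetic the results should match $EXf(X)/\sigma_X^2$ and the moment relation of part 4 exactly for the illustrative law $P(X = -1) = 1/2$, $P(X = 0) = P(X = 2) = 1/4$.

```python
from fractions import Fraction as Fr

def zero_bias_expect_fprime(xs, ps, f):
    """E f'(X*) for a discrete mean-zero X, via Lemma 2.1 part 6 (only f itself is needed)."""
    var = sum(p * x * x for x, p in zip(xs, ps))
    total = Fr(0)
    for x1, p1 in zip(xs, ps):
        for x2, p2 in zip(xs, ps):
            if x1 != x2:  # pairs with x' = x'' carry zero weight
                weight = (x1 - x2) ** 2 / (2 * var) * p1 * p2
                # integral over u in [0, 1] of f'(u*x1 + (1-u)*x2)
                total += weight * (f(x1) - f(x2)) / (x1 - x2)
    return total

xs = [Fr(-1), Fr(0), Fr(2)]
ps = [Fr(1, 2), Fr(1, 4), Fr(1, 4)]
var = sum(p * x * x for x, p in zip(xs, ps))  # 3/2

def moment(k):
    return sum(p * x ** k for x, p in zip(xs, ps))

# Definition (4) with f(x) = x**4:  E X f(X) = sigma^2 E f'(X*)
print(zero_bias_expect_fprime(xs, ps, lambda x: x ** 4) == moment(5) / var)
# Part 4 for n = 1, 2:  sigma^2 E (X*)^n = E X^(n+2) / (n + 1)
for n in (1, 2):
    lhs = var * zero_bias_expect_fprime(xs, ps, lambda x: x ** (n + 1) / (n + 1))
    print(n, lhs, moment(n + 2) / (n + 1))
```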

By part 1 of Lemma 2.1, the mean zero normal is a fixed point of the zero bias transformation. One can also gain some insight into the nature of the transformation by observing its action on the distribution of the variate $X$ taking the values $-1$ and $+1$ with equal probability. Calculating the density function of the $X$-zero biased variate $X^*$ according to part 2 of Lemma 2.1, we find that $X^*$ is uniformly distributed on the interval $[-1, 1]$. A similar calculation for the discrete mean zero variable $X$ taking values $x_1 < x_2 < \cdots < x_n$ yields that the $X$-zero biased distribution is a mixture of uniforms over the intervals $[x_i, x_{i+1}]$. These examples may help in understanding how a uniform variate $U$ enters in part 6 of Lemma 2.1.
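The mixture-of-uniforms description follows from the density in part 2, since $E[X; X > w]$ is constant for $w$ strictly between consecutive support points. The sketch below (function names hypothetical) returns the intervals $[x_i, x_{i+1}]$ with their mixture weights, and checks $EXf(X) = \sigma^2 Ef'(X^*)$ for $f = \exp$, using the fact that a uniform variate on $[a, b]$ gives $Ef' = (f(b) - f(a))/(b - a)$.

```python
from math import exp, isclose

def zero_bias_mixture(xs, ps):
    """Intervals and weights of the zero-bias law of a discrete mean-zero X.

    By Lemma 2.1 part 2 the density is E[X; X > w] / sigma^2, which is constant
    on each gap (x_i, x_{i+1}) between consecutive support points.
    """
    pts = sorted(zip(xs, ps))
    var = sum(p * x * x for x, p in pts)
    intervals, weights = [], []
    for i in range(len(pts) - 1):
        a, b = pts[i][0], pts[i + 1][0]
        tail = sum(p * x for x, p in pts[i + 1:])  # E[X; X > w] for a < w < b
        intervals.append((a, b))
        weights.append((b - a) * tail / var)
    return intervals, weights

def expect_fprime_mixture(intervals, weights, f):
    """E f'(X*) when X* is the mixture of uniforms returned above."""
    return sum(w * (f(b) - f(a)) / (b - a) for (a, b), w in zip(intervals, weights))

# Symmetric +/-1: X* is uniform on [-1, 1]
print(zero_bias_mixture([-1.0, 1.0], [0.5, 0.5]))   # ([(-1.0, 1.0)], [1.0])

xs, ps = [-1.0, 0.0, 2.0], [0.5, 0.25, 0.25]        # mean zero, variance 1.5
ints, ws = zero_bias_mixture(xs, ps)
lhs = sum(p * x * exp(x) for x, p in zip(xs, ps))   # E X f(X) with f = exp
rhs = 1.5 * expect_fprime_mixture(ints, ws, exp)    # sigma^2 E f'(X*)
print(ws, isclose(lhs, rhs))
```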

For a construction of $W$ and $W^*$ which may be applied in the presence of dependence, in the remainder of this section we will consider the following framework. Let $X_1, \ldots, X_n$ be mean zero random variables, and with $W = X_1 + \cdots + X_n$, suppose $EW^2 = \sigma^2$ exists. For each $i = 1, \ldots, n$, assume that there exists a distribution $dF_{n,i}(x_1, \ldots, x_{i-1}, x_i', x_i'', x_{i+1}, \ldots, x_n)$ on $n + 1$ variates $X_1, \ldots, X_{i-1}, X_i', X_i'', X_{i+1}, \ldots, X_n$ such that
$$(X_1, \ldots, X_{i-1}, X_i', X_i'', X_{i+1}, \ldots, X_n) =_d (X_1, \ldots, X_{i-1}, X_i'', X_i', X_{i+1}, \ldots, X_n), \qquad (5)$$
and
$$(X_1, \ldots, X_{i-1}, X_i, X_{i+1}, \ldots, X_n) =_d (X_1, \ldots, X_{i-1}, X_i', X_{i+1}, \ldots, X_n). \qquad (6)$$
(The choice $X_i' = X_i$ is natural, and then (6) is satisfied.)

Further, we will suppose that there is a $\rho$ such that for all $f$ for which $EWf(W)$ exists,
$$\sum_{i=1}^n EX_i'f(W_i + X_i'') = \rho\, EWf(W), \qquad (7)$$
where $W_i = W - X_i$. We set
$$v_i^2 = E(X_i' - X_i'')^2. \qquad (8)$$
Under these conditions, we have the following proposition.
Proposition 2.1
$$\sigma^2 = \frac{1}{2(1 - \rho)}\sum_{i=1}^n v_i^2.$$

Before proving this proposition, note that if a collection of variates already satisfies (5) and (6), and if for each $i$,
$$E\{X_i' \mid W_i + X_i''\} = \frac{\rho}{n}(W_i + X_i''), \qquad (9)$$
then
$$EX_i'f(W_i + X_i'') = \frac{\rho}{n}EWf(W),$$
and so condition (7) will be satisfied.
Proof of Proposition 2.1. Substituting $f(x) = x$ in (7) yields, by (5) and (6), that
$$\begin{aligned}
\rho\sigma^2 &= \sum_{i=1}^n EX_i'(W_i + X_i'') = \sum_{i=1}^n EX_iW_i + \sum_{i=1}^n EX_i'X_i'' \\
&= \sum_{i=1}^n EX_iW_i + \sum_{i=1}^n \Bigl\{EX_i^2 - \tfrac12 E(X_i' - X_i'')^2\Bigr\} \\
&= \sigma^2 - \frac12\sum_{i=1}^n v_i^2,
\end{aligned}$$
so that Proposition 2.1 follows. □

The following theorem, generalizing part 5 of Lemma 2.1, gives a coupling construction for $W$ and $W^*$ which may be applied in the presence of dependence under the framework of Proposition 2.1.

Theorem 2.1 Let $I$ be a random index independent of the $X$'s such that
$$P(I = i) = v_i^2 \Big/ \sum_{j=1}^n v_j^2.$$
Further, for $i$ such that $v_i > 0$, let $\hat X_1, \ldots, \hat X_{i-1}, \hat X_i', \hat X_i'', \hat X_{i+1}, \ldots, \hat X_n$ be chosen according to the distribution
$$d\hat F_{n,i}(\hat x_1, \ldots, \hat x_{i-1}, \hat x_i', \hat x_i'', \hat x_{i+1}, \ldots, \hat x_n) = \frac{(\hat x_i' - \hat x_i'')^2}{v_i^2}\, dF_{n,i}(\hat x_1, \ldots, \hat x_{i-1}, \hat x_i', \hat x_i'', \hat x_{i+1}, \ldots, \hat x_n). \qquad (10)$$
Then, with $U$ a uniform $U[0, 1]$ variate which is independent of the $X$'s and the index $I$, and with $\hat W_i = \sum_{j \ne i}\hat X_j$,
$$U\hat X_I' + (1 - U)\hat X_I'' + \hat W_I$$
has the $W$-zero biased distribution.

In particular, when $X_1, \ldots, X_n$ are exchangeable, if one constructs exchangeable variables with distribution $dF_{n,1}$ which satisfy the conditions above for $i = 1$, with $v_1^2 > 0$, then
$$U\hat X_1' + (1 - U)\hat X_1'' + \hat W_1$$
has the $W$-zero biased distribution.

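Proof of Theorem 2.1. With Proposition 2.1 we have
$$\begin{aligned}
E\int_0^1 f'(u\hat X_I' + (1 - u)\hat X_I'' + \hat W_I)\,du
&= E\left[\frac{f(\hat W_I + \hat X_I') - f(\hat W_I + \hat X_I'')}{\hat X_I' - \hat X_I''}\right] \\
&= \sum_{i=1}^n \frac{v_i^2}{\sum_j v_j^2}\, E\left[\frac{f(\hat W_i + \hat X_i') - f(\hat W_i + \hat X_i'')}{\hat X_i' - \hat X_i''}\right] \\
&= \frac{1}{\sum_j v_j^2}\sum_{i=1}^n E(X_i' - X_i'')\bigl(f(W_i + X_i') - f(W_i + X_i'')\bigr) \\
&= \frac{2}{\sum_j v_j^2}\bigl\{EWf(W) - \rho EWf(W)\bigr\} \\
&= \frac{2(1 - \rho)}{\sum_j v_j^2}\, EWf(W) \\
&= \sigma^{-2}EWf(W) \\
&= Ef'(W^*),
\end{aligned}$$
using Proposition 2.1 for the next to last step. In the fourth equality, (5) gives $EX_i''f(W_i + X_i'') = EX_i'f(W_i + X_i')$ and $EX_i''f(W_i + X_i') = EX_i'f(W_i + X_i'')$, after which (6) and (7) yield $\sum_i EX_i'f(W_i + X_i') = EWf(W)$ and $\sum_i EX_i'f(W_i + X_i'') = \rho EWf(W)$.

To show the claim in the case where the variates are exchangeable, set $dF_{n,i} = dF_{n,1}$ for $i = 2, \ldots, n$ and observe that the $dF_{n,i}$ so defined now satisfy the conditions of the theorem, and the distribution of the resulting $U\hat X_i' + (1 - U)\hat X_i'' + \hat W_i$ does not depend on $i$.

□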

Note that if the variates $X_1, \ldots, X_n$ are independent, one can generate the collection $X_1, \ldots, X_{i-1}, X_i', X_i'', X_{i+1}, \ldots, X_n$ by letting $X_i', X_i''$ be independent replicates of $X_i$. In this case, conditions (5), (6), and (7) above are satisfied, the last with $\rho = 0$, and the construction reduces to that given in part 5 of Lemma 2.1 in view of part 6 of that same lemma.



3 Bounds in the Central Limit Theorem 

The construction of Lemma 2.1, part 5, together with the following bounds of Barbour [2] and Götze [?] on the solution $f$ of the differential equation (1) (with $\mu = 0$) for a test function $h$ with $k$ bounded derivatives,
$$\|f^{(j)}\| \le \frac{1}{j}\sigma^{-j}\|h^{(j)}\|, \qquad j = 1, \ldots, k, \qquad (12)$$
yield the following remarkably simple proof of the Central Limit Theorem, with bounds on the approximation error, for independent, possibly non-identically distributed, mean zero variables $X_1, \ldots, X_n$ with variance 1 and common absolute first and third moments.

By Lemma 2.1, part 5, using independence, we can achieve $W^*$ having the $W$-zero biased distribution by selecting a random index $I$ uniformly and replacing $X_I$ by an independent variable $X_I^*$ having the $X_I$-zero biased distribution. Now, since $EWf'(W) = \sigma^2 Ef''(W^*)$ by (4), using the bound (12),
$$\begin{aligned}
|E\{h(W/\sigma) - \Phi h\}| &= |E\{Wf'(W) - \sigma^2 f''(W)\}| \\
&= \sigma^2|E\{f''(W^*) - f''(W)\}| \\
&\le \sigma^2\|f^{(3)}\|\,E|W^* - W| \\
&\le \frac{1}{3\sigma}\|h^{(3)}\|\,E|X_I^* - X_I|. \qquad (13)
\end{aligned}$$
Now, using the bound $E|X_I^* - X_I| \le E|X_I^*| + E|X_I|$ and the function $x^2\,\mathrm{sgn}(x)$ and its derivative $2|x|$ in equation (4), we derive $E|X_i^*| = \frac12 E|X_i|^3$, and therefore $E|X_I^*| = \frac12 E|X_I|^3$. Next, $E|X_I| = E|X_1|$, and by Hölder's inequality and $EX_i^2 = 1$, we have $E|X_1| \le 1 \le E|X_1|^3$. Hence, since $EW^2 = n = \sigma^2$,
$$|E\{h(W/\sigma) - \Phi h\}| \le \frac{\|h^{(3)}\|\,E|X_1|^3}{2\sqrt{n}}.$$

Thus we can obtain a bound of order $n^{-1/2}$ for smooth test functions with an explicit constant using only the first term in the Taylor expansion of $f''(W^*) - f''(W)$. For arbitrary independent mean zero variates, continuing from (13), for small additional effort the above inequality generalizes to
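One way to carry out this additional step, sketched here from (13) rather than quoted from the original, uses $P(I = i) = \sigma_i^2/\sigma^2$ from Lemma 2.1, part 5, together with $\sigma_i^2 E|X_i^*| = \frac12 E|X_i|^3$, obtained from (4) with $f(x) = x^2\,\mathrm{sgn}(x)$ exactly as above:

```latex
\begin{aligned}
E|X_I^* - X_I|
  &\le \sum_{i=1}^n \frac{\sigma_i^2}{\sigma^2}\bigl(E|X_i^*| + E|X_i|\bigr)
   = \frac{1}{\sigma^2}\sum_{i=1}^n \Bigl(\tfrac12 E|X_i|^3 + \sigma_i^2 E|X_i|\Bigr),\\
|E\{h(W/\sigma) - \Phi h\}|
  &\le \frac{\|h^{(3)}\|}{3\sigma^3}\sum_{i=1}^n \Bigl(\tfrac12 E|X_i|^3 + \sigma_i^2 E|X_i|\Bigr).
\end{aligned}
```

Since $\sigma_i^2 E|X_i| \le \sigma_i^3 \le E|X_i|^3$ by Lyapunov's inequality, the right side is at most $\|h^{(3)}\|\sum_i E|X_i|^3/(2\sigma^3)$, which for unit variances recovers the order $n^{-1/2}$ bound derived above.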

The following theorem shows how the distance between an arbitrary mean zero, finite variance random variable $W$ and a mean zero normal with the same variance can be bounded by the distance between $W$ and a variate $W^*$ with the $W$-zero biased distribution defined on a joint space. It is instructive to compare the following theorem with Theorem 1.1 of [?], the corresponding result when using the size biased transformation.

Theorem 3.1 Let $W$ be a mean zero random variable with variance $\sigma^2$, and suppose $(W, W^*)$ is given on a joint probability space so that $W^*$ has the $W$-zero biased distribution. Then for all $h$ with four bounded derivatives,
$$|Eh(W/\sigma) - \Phi h| \le \frac{1}{3\sigma}\|h^{(3)}\|\sqrt{E\{E(W^* - W \mid W)^2\}} + \frac{1}{8\sigma^2}\|h^{(4)}\|\,E(W^* - W)^2.$$

Proof. For the given $h$, let $f$ be the solution to (1) with $\mu = 0$. Then, using the bounds in (12), it suffices to prove
$$|E\{Wf'(W) - \sigma^2 f''(W)\}| \le \sigma^2\|f^{(3)}\|\sqrt{E\{E(W^* - W \mid W)^2\}} + \frac{\sigma^2}{2}\|f^{(4)}\|\,E(W^* - W)^2.$$
By Taylor expansion, we have
$$\begin{aligned}
|E\{Wf'(W) - \sigma^2 f''(W)\}| &= |\sigma^2 E\{f''(W^*) - f''(W)\}| \\
&\le \sigma^2|Ef^{(3)}(W)(W^* - W)| + \frac{\sigma^2}{2}\|f^{(4)}\|\,E(W^* - W)^2.
\end{aligned}$$
For the first term, condition on $W$ and then apply the Cauchy-Schwarz inequality:
$$|E[f^{(3)}(W)(W^* - W)]| = |E[f^{(3)}(W)E(W^* - W \mid W)]| \le \|f^{(3)}\|\sqrt{E\{E(W^* - W \mid W)^2\}}. \qquad \square$$

For illustration only, we apply Theorem 3.1 to the sum of independent identically distributed variates to show how the zero bias transformation leads to an error bound for smooth functions of order $1/n$, under additional moment assumptions which include a vanishing third moment.

Corollary 3.1 Let $X, X_1, X_2, \ldots, X_n$ be independent and identically distributed mean zero, variance one random variables with vanishing third moment and $EX^4$ finite. Set $W = \sum_{i=1}^n X_i$. Then for any function $h$ with four bounded derivatives,
$$|E\{h(W/\sqrt{n}) - \Phi h\}| \le n^{-1}\left\{\frac13\|h^{(3)}\| + \frac16\|h^{(4)}\|\,EX^4\right\}.$$
Proof: Construct $W^*$ as in Lemma 2.1, part 5. Then
$$E(W^* - W \mid W) = E(X_I^* - X_I \mid W) = E(X_I^*) - E(X_I \mid W),$$
since $X_I^*$ and $W$ are independent. Using the moment relation $EX^* = (1/2)EX^3$ given in Lemma 2.1, part 4, $EX^3 = 0$ implies that $EX^* = 0$, and so $EX_I^* = 0$. Using that the $X$'s are i.i.d., and therefore exchangeable, $E(X_I \mid W) = W/n$. Hence we obtain $E(X_I^* - X_I \mid W) = -W/n$, and
$$\sqrt{E\{E(X_I^* - X_I \mid W)^2\}} = \frac{1}{\sqrt{n}}.$$
For the second term in Theorem 3.1,
$$E(W^* - W)^2 = E(X_I^* - X_I)^2.$$
The moment relation, property 4 in Lemma 2.1, and the assumption that $EX^4$ exists render $E(X_I^* - X_I)^2$ finite and equal to $EX^4/3 + EX^2 \le (4/3)EX^4$, by $EX^2 = 1$ and Hölder's inequality. Now using $\sigma^2 = n$ and applying Theorem 3.1 yields the assertion. □
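As a sanity check of Corollary 3.1 (our choice of example, not the paper's), take $X$ uniform on $[-\sqrt{3}, \sqrt{3}]$, which has mean zero, variance one, vanishing third moment and $EX^4 = 9/5$, together with the test function $h = \cos$, for which $\|h^{(3)}\| = \|h^{(4)}\| = 1$. Since $E\cos(tX) = \sin(\sqrt{3}t)/(\sqrt{3}t)$, the error is available in closed form, and the sketch compares it with the bound $n^{-1}(1/3 + EX^4/6)$.

```python
from math import exp, sin, sqrt

def smooth_error_uniform(n):
    """|E cos(W/sqrt(n)) - E cos(Z)| for W a sum of n i.i.d. Uniform[-sqrt(3), sqrt(3)]."""
    u = sqrt(3.0 / n)               # sqrt(3) * t with t = 1/sqrt(n)
    phi = sin(u) / u                # characteristic function of X at t (real since X is symmetric)
    return abs(phi ** n - exp(-0.5))  # E cos(Z) = exp(-1/2) for Z ~ N(0, 1)

def corollary_bound(n, h3=1.0, h4=1.0, ex4=9.0 / 5.0):
    """Right-hand side of Corollary 3.1; ||h'''|| = ||h''''|| = 1 for h = cos."""
    return (h3 / 3.0 + h4 * ex4 / 6.0) / n

for n in (5, 50, 500, 5000):
    print(n, smooth_error_uniform(n), corollary_bound(n), n * smooth_error_uniform(n))
```

Here $n$ times the error approaches $e^{-1/2}/20 \approx 0.030$, comfortably inside the constant $1/3 + 9/30 \approx 0.63$ guaranteed by the corollary.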
It is interesting to note that the constant $\rho$ of equation (7) does not appear in the bounds of Theorem 3.1. One explanation of this phenomenon is as follows. The $\rho$ of the coupling of Theorem 2.1 is related to the $\lambda \in (0, 1)$ of a coupling of Stein [15], where a mean zero exchangeable pair $(W, W')$, with distribution $dF(w, w')$, satisfies $E\{W' \mid W\} = (1 - \lambda)W$. One can show that if $(\hat W, \hat W')$ has distribution
$$d\hat F(\hat w, \hat w') = \frac{(\hat w - \hat w')^2}{E(W - W')^2}\, dF(\hat w, \hat w'),$$
then with $U$ a uniform variate on $[0, 1]$, independent of all other variables, $U\hat W + (1 - U)\hat W'$ has the $W$-zero bias distribution. Taking simple cases, one can see that the value of $\lambda$ has no relation to the closeness of $W$ to the normal. For instance, if $W$ is the sum of $n$ i.i.d. mean zero, variance one variables, then $W$ is close to normal when $n$ is large. However, for a given value of $n$, we may achieve any $\lambda$ of the form $j/n$ by taking $W'$ to be the sum of any $n - j$ variables that make up the sum $W$, added to $j$ i.i.d. variables that are independent of those that form $W$, but which have the same distribution.

We only study here the notion of zero biasing in one dimension; it is possible to extend this concept to any finite dimension. The definition of zero biasing in $\mathbb{R}^p$ is motivated by the following multivariate characterization. A vector $Z \in \mathbb{R}^p$ is multivariate normal with mean zero and covariance matrix $\Sigma = (\sigma_{ij})$ if and only if for all smooth test functions $f: \mathbb{R}^p \to \mathbb{R}$,
$$E\sum_{i=1}^p Z_i f_i(Z) = E\sum_{i,j=1}^p \sigma_{ij} f_{ij}(Z),$$
where $f_i, f_{ij}, \ldots$ denote the partial derivatives of $f$ with respect to the indicated coordinates. Guided by this identity, given a mean zero vector $X = (X_1, \ldots, X_p)$ with covariance matrix $\Sigma$, we say the collection of vectors $X^* = (X_{ij}^*)$ has the $X$-zero bias distribution if
$$E\sum_{i=1}^p X_i f_i(X) = E\sum_{i,j=1}^p \sigma_{ij} f_{ij}(X_{ij}^*) \qquad (14)$$
for all smooth $f$. As in the univariate case, the mean zero normal is a fixed point of the zero bias transformation; that is, if $X$ is a mean zero normal vector, one may satisfy (14) by setting $X_{ij}^* = X$ for all $i, j$.

Using the definition of zero biasing in finite dimension, one can define the zero bias concept for random variables over an arbitrary index set $\mathcal{H}$ as follows. Given a collection $\{\xi(\phi), \phi \in \mathcal{H}\}$ of real valued mean zero random variables with nontrivial finite second moment, we say the collection $\{\xi_{\phi\psi}^*, \phi, \psi \in \mathcal{H}\}$ has the $\xi$-zero biased distribution if for all $p = 1, 2, \ldots$ and $(\phi_1, \phi_2, \ldots, \phi_p) \in \mathcal{H}^p$, the collection of $p$-vectors $(X_{ij}^*)$ has the $X$-zero bias distribution, where, for $1 \le i, j \le p$,
$$X_{ij}^* = (\xi_{\phi_i\phi_j}^*(\phi_1), \ldots, \xi_{\phi_i\phi_j}^*(\phi_p)),$$
and
$$X = (\xi(\phi_1), \ldots, \xi(\phi_p)).$$
Again when $\xi$ is normal, we may set $\xi_{\phi\psi}^* = \xi$ for all $\phi, \psi$. This definition reduces to the one given above for random vectors when $\mathcal{H} = \{1, 2, \ldots, n\}$, and can be applied to, say, random processes by setting $\mathcal{H} = \mathbb{R}$, or random measures by letting $\mathcal{H}$ be a specified class of functions.

4 Application: Simple random sampling 

We now apply Theorem 3.1 to obtain a bound on the error incurred when using the normal to approximate the distribution of a sum obtained by simple random sampling. In order to obtain a bound of order $1/n$ for smooth functions, we impose an additional moment condition as in Corollary 3.1.

Let $\mathcal{A} = \{a_1, \ldots, a_N\}$ be a set of real numbers such that
$$\sum_{a \in \mathcal{A}} a = \sum_{a \in \mathcal{A}} a^3 = 0; \qquad (15)$$
the following is a useful consequence of (15):
$$\text{for any } E \subset \{1, \ldots, N\} \text{ and } k \in \{1, 3\}, \quad \sum_{i \in E} a_i^k = -\sum_{i \notin E} a_i^k. \qquad (16)$$
We assume until the statement of Theorem 4.1 that the elements of $\mathcal{A}$ are distinct; this condition will be dropped in the theorem. Let $0 < n < N$, and set $N_n = N(N-1)\cdots(N-n+1)$, the $n$th falling factorial of $N$. Consider the random vector $X = (X_1, \ldots, X_n)$ obtained by a simple random sample of size $n$ from $\mathcal{A}$, that is, $X$ is a realization of one of the equally likely $N_n$ vectors of distinct elements of $\mathcal{A}$. Put
$$W = X_1 + \cdots + X_n. \qquad (17)$$
Then, simply, we have $EX_i = EX_i^3 = EW = EW^3 = 0$, and
$$EX_i^2 = \frac{1}{N}\sum_{a \in \mathcal{A}} a^2, \qquad \sigma^2 = EW^2 = \frac{n(N-n)}{N(N-1)}\sum_{a \in \mathcal{A}} a^2.$$
As we will consider the normalized variate $W/\sigma$, without loss of generality we may assume
$$\sum_{a \in \mathcal{A}} a^2 = 1; \qquad (18)$$
note that (18) can always be enforced by rescaling $\mathcal{A}$, leaving (15) unchanged.
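A small exact check of these formulas: the sketch builds a hypothetical symmetric population (some values and their negatives, which satisfies (15) automatically), rescales it to satisfy (18), enumerates all $N_n$ ordered samples, and compares the moments of $W$ with $EW = EW^3 = 0$ and $\sigma^2 = n(N-n)/(N(N-1))$. The helper names and the population values are illustrative.

```python
from itertools import permutations
from math import isclose, sqrt

def symmetric_population(values):
    """Hypothetical helper: {+v/s, -v/s}, which satisfies (15) by symmetry
    and (18) because s = sqrt(2 * sum of v^2)."""
    s = sqrt(2 * sum(v * v for v in values))
    return [v / s for v in values] + [-v / s for v in values]

def srs_moments(A, n):
    """Exact E W, E W^2, E W^3 over all N_n equally likely ordered samples."""
    sums = [sum(sample) for sample in permutations(A, n)]
    return tuple(sum(w ** k for w in sums) / len(sums) for k in (1, 2, 3))

A = symmetric_population([0.3, 1.0, 2.5])   # N = 6 distinct elements
N, n = len(A), 3
m1, m2, m3 = srs_moments(A, n)
print(m1, m3)                               # both zero up to rounding
print(m2, n * (N - n) / (N * (N - 1)))      # both approximately 0.3
```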

The next proposition shows how to apply Theorem 2.1 to construct $W^*$ in the context of simple random sampling.

Proposition 4.1 Let
$$dF_{n,1}(x_1', x_1'', x_2, \ldots, x_n) = N_{n+1}^{-1}\,\mathbf{1}(\{x_1', x_1'', x_2, \ldots, x_n\} \subset \mathcal{A}, \text{ distinct}), \qquad (19)$$
the simple random sampling distribution on $n + 1$ variates from $\mathcal{A}$, and let $\hat X = (\hat X_1', \hat X_1'', \hat X_2, \ldots, \hat X_n)$ be a random vector with distribution
$$d\hat F_{n,1}(\hat x) = \frac{(\hat x_1' - \hat x_1'')^2}{2N}\,(N-2)_{n-1}^{-1}\,\mathbf{1}(\{\hat x_1', \hat x_1'', \hat x_2, \ldots, \hat x_n\} \subset \mathcal{A}, \text{ distinct}). \qquad (20)$$
Then, with $U$ a uniform $[0, 1]$ random variable independent of $\hat X$, and $\hat W_1$ given by $\hat W_1 = \hat X_2 + \cdots + \hat X_n$,
$$W^* = U\hat X_1' + (1 - U)\hat X_1'' + \hat W_1 \qquad (21)$$
has the $W$-zero biased distribution.

Proof. We apply Theorem 2.1 for exchangeable variates. With $X_1', X_1'', X_2, \ldots, X_n$ a simple random sample of size $n + 1$, the distributional relations (5) and (6) are immediate. Next, using the scaling (18), we see that $v_1^2$ given in (8) equals $2/(N - 1)$, which is positive, and that furthermore, the distribution (20) is constructed from the distribution (19) according to the prescription (10). Lastly, using (16) with $k = 1$, we have
$$E\{X_1' \mid X_1'', X_2, \ldots, X_n\} = -\frac{X_1'' + X_2 + \cdots + X_n}{N - n},$$
and hence condition (9) is satisfied with $\rho = -n/(N - n)$. □
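The three quantities in this proof can be confirmed by brute force on a small population. The sketch below reuses an illustrative symmetric population satisfying (15) and (18), computes $v_1^2$ by enumerating ordered pairs, checks Proposition 2.1 in the exchangeable form $\sigma^2 = n v_1^2/(2(1 - \rho))$, and verifies the conditional expectation behind (9) for every possible value of $(X_1'', X_2, \ldots, X_n)$.

```python
from itertools import permutations
from math import isclose, sqrt

vals = [0.3, 1.0, 2.5]                       # hypothetical population values
s = sqrt(2 * sum(v * v for v in vals))
A = [v / s for v in vals] + [-v / s for v in vals]
N, n = len(A), 3

# v_1^2 = E(X1' - X1'')^2 under simple random sampling of two elements
pairs = list(permutations(A, 2))
v1_sq = sum((a - b) ** 2 for a, b in pairs) / len(pairs)

rho = -n / (N - n)
sigma_sq = n * (N - n) / (N * (N - 1))

# Condition (9): E{X1' | X1'', X2, ..., Xn} = (rho / n)(X1'' + X2 + ... + Xn)
max_dev = 0.0
for rest in permutations(range(N), n):       # indices of (X1'', X2, ..., Xn)
    others = [A[j] for j in range(N) if j not in rest]
    cond_mean = sum(others) / len(others)    # X1' is uniform on the N - n others
    max_dev = max(max_dev, abs(cond_mean - rho / n * sum(A[j] for j in rest)))

print(v1_sq, 2 / (N - 1))                    # Proposition 4.1
print(n * v1_sq / (2 * (1 - rho)), sigma_sq) # Proposition 2.1
print(max_dev)                               # zero up to rounding
```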
We now begin to apply Theorem 3.1 by constructing $W$ and $W^*$ on a joint space. We achieve this goal by constructing the simple random sample $X = (X_1, \ldots, X_n)$ together with the variates $\hat X = (\hat X_1', \hat X_1'', \hat X_2, \ldots, \hat X_n)$ with distribution as in (20) of Proposition 4.1. $W$ and $W^*$ are then formed from these variates according to (17) and (21), respectively.

Construction of $W$ and $W^*$. Start the construction with the simple random sample $X = (X_1, \ldots, X_n)$. To begin the construction of $\hat X$ with distribution (20), set
$$q(u, v) = \frac{(u - v)^2}{2N}\,\mathbf{1}(\{u, v\} \subset \mathcal{A}).$$
Note that variates $U, V$ with distribution $q(u, v)$ will be unequal, and therefore we have that the distribution (20) factors as
$$d\hat F_{n,1}(\hat x) = q(\hat x_1', \hat x_1'')\,(N-2)_{n-1}^{-1}\,\mathbf{1}(\{\hat x_2, \ldots, \hat x_n\} \subset \mathcal{A} \setminus \{\hat x_1', \hat x_1''\}, \text{ distinct}). \qquad (22)$$
Hence, given $(\hat X_1', \hat X_1'')$, the vector $(\hat X_2, \ldots, \hat X_n)$ is a simple random sample of size $n - 1$ from the $N - 2$ elements of $\mathcal{A} \setminus \{\hat X_1', \hat X_1''\}$.

Now, independently of the chosen sample $X$, pick $(\hat X_1', \hat X_1'')$ from the distribution $q(u, v)$. The variates $(\hat X_1', \hat X_1'')$ are then placed as the first two components in the vector $\hat X$. How the remaining $n - 1$ variates in $\hat X$ are chosen depends on the amount of intersection between the sets $\{X_2, \ldots, X_n\}$ and $\{\hat X_1', \hat X_1''\}$. If these two sets do not intersect, fill in the remaining $n - 1$ components of $\hat X$ with $(X_2, \ldots, X_n)$. If the sets have an intersection, remove from the vector $(X_2, \ldots, X_n)$ the two variates (or single variate) that intersect and replace them (or it) with values obtained by a simple random sample of size two (one) from $\mathcal{A} \setminus \{\hat X_1', \hat X_1'', X_2, \ldots, X_n\}$. This new vector now fills in the remaining $n - 1$ positions in $\hat X$.

More formally, the construction is as follows. After generating $X$ and $(\hat X_1', \hat X_1'')$ independently from their respective distributions, we define
$$R = |\{X_2, \ldots, X_n\} \cap \{\hat X_1', \hat X_1''\}|.$$
There are three cases.

Case 0: $R = 0$. In this case, set $(\hat X_1', \hat X_1'', \hat X_2, \ldots, \hat X_n) = (\hat X_1', \hat X_1'', X_2, \ldots, X_n)$.

Case 1: $R = 1$. If, say, $\hat X_1'$ equals $X_J$, then set $\hat X_i = X_i$ for $2 \le i \le n$, $i \ne J$, and let $\hat X_J$ be drawn uniformly from $\mathcal{A} \setminus \{\hat X_1', \hat X_1'', X_2, \ldots, X_n\}$.

Case 2: $R = 2$. If $\hat X_1' = X_J$ and $\hat X_1'' = X_K$, say, then set $\hat X_i = X_i$ for $2 \le i \le n$, $i \notin \{J, K\}$, and let $\{\hat X_J, \hat X_K\}$ be a simple random sample of size 2 drawn from $\mathcal{A} \setminus \{\hat X_1', \hat X_1'', X_2, \ldots, X_n\}$.

The following proposition follows from Proposition 4.1, the representation of the distribution (20) as the product (22), and the fact that, conditional on $(\hat X_1', \hat X_1'')$, the above construction leads to sampling uniformly by rejection from $\mathcal{A} \setminus \{\hat X_1', \hat X_1''\}$.

Proposition 4.2 Let $X = (X_1, \ldots, X_n)$ be a simple random sample of size $n$ from $\mathcal{A}$ and let $(\hat X_1', \hat X_1'') \sim q(u, v)$ be independent of $X$. If $\hat X_2, \ldots, \hat X_n$, given $\hat X_1', \hat X_1'', X_2, \ldots, X_n$, are constructed as above, then $(\hat X_1', \hat X_1'', \hat X_2, \ldots, \hat X_n)$ has distribution (20), and with $U$ an independent uniform variate on $[0, 1]$,
$$W^* = U\hat X_1' + (1 - U)\hat X_1'' + \hat X_2 + \cdots + \hat X_n, \qquad W = X_1 + \cdots + X_n$$
is a realization of $(W, W^*)$ on a joint space where $W^*$ has the $W$-zero biased distribution.
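The construction of Proposition 4.2 is straightforward to simulate. The sketch below works with indices into $\mathcal{A}$ (so that "intersection" means the same population element), draws $(\hat X_1', \hat X_1'')$ from $q$ using precomputed cumulative weights, repairs collisions with $\{X_2, \ldots, X_n\}$ as in Cases 0, 1 and 2, and returns one draw of $(W, W^*)$. As a Monte Carlo check, for $f = \sin$ it compares the exact value of $EWf(W)$, obtained by enumeration, with a simulated $\sigma^2 Ef'(W^*) = \sigma^2 E\cos(W^*)$. The helper names, population, seed and sample size are illustrative assumptions.

```python
import random
from itertools import accumulate, permutations
from math import cos, sin, sqrt

def q_table(A):
    """Ordered index pairs (u, v) with weights (a_u - a_v)^2, i.e. q up to the factor 1/(2N)."""
    N = len(A)
    pairs = [(i, j) for i in range(N) for j in range(N) if i != j]
    cum_w = list(accumulate((A[i] - A[j]) ** 2 for i, j in pairs))
    return pairs, cum_w

def draw_w_wstar(A, n, rng, pairs, cum_w):
    """One draw of (W, W*) following the construction before Proposition 4.2."""
    N = len(A)
    sample = rng.sample(range(N), n)                   # X_1, ..., X_n (as indices)
    i1, i2 = rng.choices(pairs, cum_weights=cum_w)[0]  # (X1', X1'') ~ q
    rest = sample[1:]                                  # X_2, ..., X_n
    hits = [k for k, j in enumerate(rest) if j in (i1, i2)]  # R = len(hits)
    if hits:                                           # Cases 1 and 2
        used = {i1, i2, *rest}
        pool = [j for j in range(N) if j not in used]
        for k, j in zip(hits, rng.sample(pool, len(hits))):
            rest[k] = j
    u = rng.random()
    w = sum(A[j] for j in sample)
    w_star = u * A[i1] + (1 - u) * A[i2] + sum(A[j] for j in rest)
    return w, w_star

# Hypothetical population satisfying (15) and (18)
vals = [0.3, 1.0, 2.5]
s = sqrt(2 * sum(v * v for v in vals))
A = [v / s for v in vals] + [-v / s for v in vals]
N, n = len(A), 3
sigma_sq = n * (N - n) / (N * (N - 1))

# Exact E[W sin(W)] over all ordered samples
sums = [sum(A[j] for j in idx) for idx in permutations(range(N), n)]
lhs = sum(w * sin(w) for w in sums) / len(sums)

rng = random.Random(12345)
pairs, cum_w = q_table(A)
draws = [draw_w_wstar(A, n, rng, pairs, cum_w) for _ in range(50_000)]
rhs = sigma_sq * sum(cos(ws) for _, ws in draws) / len(draws)
mean_w_star = sum(ws for _, ws in draws) / len(draws)  # EW* = 0 since EW^3 = 0
print(lhs, rhs, mean_w_star)
```

Working with indices rather than values keeps the collision test exact even if two population elements happen to be numerically close.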

Under the moment conditions in (15), we now have the ingredients to show that a bound of order $1/n$ holds, for smooth functions, for the normal approximation of $W = \sum_{i=1}^n X_i$. First, define
$$\mu(k) = \sum_{a \in \mathcal{A}} a^k,$$
[display (23), defining $C_1(N, n, \mathcal{A})$, is not legible in this copy]
and
$$C_2(N, \mathcal{A}) = 11\mu(4) + \frac{45}{N}, \qquad (24)$$
where $\alpha$, $\beta$, $\gamma$ and $\eta$, which enter (23), are given in (26), (27), (28), and (29), respectively.

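Theorem 4.1 Let $X_1, \ldots, X_n$ be a simple random sample of size $n$ from a set of $N$ real numbers $\mathcal{A}$ satisfying (15). Then with $W = \sum_{i=1}^n X_i$, for all $h$ with four bounded derivatives we have
$$|Eh(W/\sigma) - \Phi h| \le \frac{1}{3\sigma}C_1(N, n, \mathcal{A})\|h^{(3)}\| + \frac{1}{8\sigma^2}C_2(N, \mathcal{A})\|h^{(4)}\|. \qquad (25)$$
Further, if $n \to \infty$ so that $n/N \to f \in (0, 1)$, then it follows that
$$|Eh(W/\sigma) - \Phi h| \le n^{-1}\{B_1\|h^{(3)}\| + B_2\|h^{(4)}\|\}(1 + o(1)),$$
where $B_1$ is given by an expression [not legible in this copy] and
$$B_2 = \frac18\Bigl\{11n\sum_{a \in \mathcal{A}} a^4 + 45f\Bigr\}\{f(1 - f)\}^{-1}.$$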



We see as follows that this bound yields a rate $n^{-1}$ quite generally when values in $\mathcal{A}$ are "comparable." For example, suppose that $Y_1, Y_2, \ldots$ are independent copies of a nontrivial random variable $Y$ with $EY^6 < \infty$ and $EY^2 = 1$. If $N$ is, say, even, let the elements of $\mathcal{A}$ be equal to the $N/2$ values
$$\frac{Y_1}{(2\sum_{j=1}^{N/2} Y_j^2)^{1/2}}, \ldots, \frac{Y_{N/2}}{(2\sum_{j=1}^{N/2} Y_j^2)^{1/2}}$$
and their negatives. Then this collection satisfies (15) and (18), and by the law of large numbers, a.s. as $N \to \infty$, the terms $n\sum_{a \in \mathcal{A}} a^4$ and $n^2\sum_{a \in \mathcal{A}} a^6$ converge to constants. Specifically,
$$n\sum_{a \in \mathcal{A}} a^4 \to fEY^4 \quad \text{and} \quad n^2\sum_{a \in \mathcal{A}} a^6 \to f^2EY^6,$$
and so $B_1$ and $B_2$ are asymptotically constant, and the rate $1/n$ is achieved over the class of functions $h$ with bounded derivatives up to fourth order.
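The law of large numbers claim can be illustrated by simulation. The sketch below takes $Y$ standard normal (an illustrative choice with $EY^2 = 1$, $EY^4 = 3$, $EY^6 = 15$), builds $\mathcal{A}$ exactly as described with $N = 200{,}000$ and $n = N/4$, so that $f = 1/4$, and compares $n\sum_a a^4$ and $n^2\sum_a a^6$ with $fEY^4$ and $f^2EY^6$. The helper name, seed and sizes are assumptions made for the example.

```python
import random
from math import isclose, sqrt

def symmetric_population_from_sample(ys):
    """Elements Y_i / (2 * sum Y_j^2)^(1/2) and their negatives (so N = 2 * len(ys))."""
    s = sqrt(2 * sum(y * y for y in ys))
    return [y / s for y in ys] + [-y / s for y in ys]

rng = random.Random(2024)
N = 200_000
ys = [rng.gauss(0.0, 1.0) for _ in range(N // 2)]  # Y ~ N(0, 1): EY^4 = 3, EY^6 = 15
A = symmetric_population_from_sample(ys)
n = N // 4
f = n / N

def power_sum(k):
    return sum(a ** k for a in A)

print(sum(A), power_sum(2), power_sum(3))  # conditions (15) and (18)
print(n * power_sum(4), f * 3.0)           # both near 0.75
print(n ** 2 * power_sum(6), f ** 2 * 15.0)  # both near 0.9375
```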
Proof. Both $Eh(W/\sigma)$ and the upper bound in (25) are continuous functions of $\{a_1, \ldots, a_N\}$. Hence, since any collection of $N$ numbers $\mathcal{A}$ is arbitrarily close to a collection of $N$ distinct numbers, it suffices to prove the theorem under the assumption that the elements of $\mathcal{A}$ are distinct.

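We apply Theorem 3.1. Constructing $W$ and $W^*$ as in Proposition 4.2 and using standard inequalities and routine computations, one can show that $\mathrm{Var}\{E(W^* - W \mid W)\}$ is bounded by twice a sum of two variances, one of them $\mathrm{Var}(\Delta)$; the displays defining $\Delta$ and the quantities $\alpha$, $\beta$, $\gamma$ and $\eta$ in (26) through (29) are not legible in this copy. This gives
$$\mathrm{Var}(E\{W^* - W \mid W\}) \le C_1(N, n, \mathcal{A})^2,$$
where $C_1(N, n, \mathcal{A})$ is given in (23); since $E(W^* - W) = EW^* = 0$ by Lemma 2.1, part 4, and $EW^3 = 0$, this bounds $E\{E(W^* - W \mid W)^2\}$. Again, straightforward computations give that
$$E(W^* - W)^2 \le 11\sum_{a \in \mathcal{A}} a^4 + \frac{45}{N} = C_2(N, \mathcal{A}).$$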
Details can be found in the technical report 